Statistical Machine Translation with Long Phrase Table and without Long Parallel Sentences

Authors

  • Jin'ichi Murakami
  • Masato Tokuhisa
  • Satoru Ikehara
Abstract

In this study, we focused on the reliability of the phrase table. To build the phrase table, we used Och's method [3]. However, this method sometimes generates completely wrong phrase tables, and we found that such phrase tables were caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. We also used general tools for statistical machine translation, such as "Giza++" [4], "moses" [5], and "training-phrase-model.perl" [6]. With our proposed method, we obtained BLEU scores of 0.2229 on the Intrinsic-JE task and 0.2393 on the Intrinsic-EJ task. With a standard method, we obtained BLEU scores of 0.2162 on the Intrinsic-JE task and 0.2533 on the Intrinsic-EJ task. This means that our proposed method was effective for the Intrinsic-JE task but not for the Intrinsic-EJ task. Overall, our system showed average performance among all participating systems: it ranked 20th of 34 systems on the Intrinsic-JE task and 12th of 20 systems on the Intrinsic-EJ task.
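The preprocessing step described above, removing long sentence pairs from the training data before phrase-table training, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the length cutoff, so `MAX_TOKENS = 40` is a hypothetical value, the function name `filter_long_pairs` is illustrative, and sentence length is measured in whitespace-separated tokens, which assumes the Japanese side has already been word-segmented.

```python
from typing import Iterable, List, Tuple

# Hypothetical cutoff: the abstract does not state the threshold the authors used.
MAX_TOKENS = 40


def filter_long_pairs(
    pairs: Iterable[Tuple[str, str]], max_tokens: int = MAX_TOKENS
) -> List[Tuple[str, str]]:
    """Return the sentence pairs whose source and target sides each contain
    at most `max_tokens` whitespace-separated tokens.

    Assumes both sides are already tokenized (e.g. Japanese text segmented
    into words), so that str.split() yields one token per word.
    """
    kept: List[Tuple[str, str]] = []
    for src, tgt in pairs:
        # Drop the pair if either side exceeds the length limit.
        if len(src.split()) <= max_tokens and len(tgt.split()) <= max_tokens:
            kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    sample = [
        ("kore wa pen desu", "this is a pen"),
        (" ".join(["w"] * 60), " ".join(["v"] * 55)),  # artificially long pair
    ]
    print(filter_long_pairs(sample))  # → [('kore wa pen desu', 'this is a pen')]
```

Filtering on both sides is one plausible reading of "long parallel sentences"; the abstract does not say whether the authors measured the source side, the target side, or both, and the filtered corpus would then be passed to the alignment and phrase-extraction tools listed above.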

Similar resources

Statistical machine translation without long parallel sentences for training data

In this study, we focused on the reliability of the phrase table. We built the phrase table using Och's method [2]. However, this method sometimes generates completely wrong phrase tables. We found that such phrase tables were caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. We also used general tools for statistical machine translat...

Statistical machine translation using large j/e parallel corpus and long phrase tables

We describe our statistical machine translation system, which uses a large set of Japanese-English parallel sentences and long phrase tables. We collected 698,973 Japanese-English parallel sentences and used long phrase tables. We also used general tools for statistical machine translation, such as "Giza++"[1], "moses"[2], and "training-phrasemodel.perl"[3]. We used these data and these tools, W...

Collecting Bilingual Technical Terms from Patent Families of Character-Segmented Chinese Sentences and Morpheme-Segmented Japanese Sentences

In manual translation of patent documents, a bilingual lexicon of technical terms is indispensable for a translator to translate patent documents efficiently. Dong et al. (2015) proposed a method of generating a bilingual technical term lexicon from morpheme-segmented parallel patent sentences. The proposed method estimates Japanese-Chinese translation of technical terms using the phrase translation tab...

Joint Phrase Alignment and Extraction for Statistical Machine Translation

The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed m...

A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation

We compare two pivot strategies for phrase-based statistical machine translation (SMT), namely phrase translation and sentence translation. The phrase translation strategy means that we directly construct a phrase translation table (phrase-table) of the source and target language pair from two phrase-tables; one constructed from the source language and English and one constructed from English a...

Publication date: 2008